Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering
Abstract
Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method on a single machine. Moreover, big data are often collected and stored in a distributed manner on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing: 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieve superior or competitive performance compared to two typical distributed methods, Scalable-NMF and scalable k-means++.
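The abstract does not spell out how the plug-in weighted average is constructed. A common way to realize the idea, and the assumption behind this sketch, is inverse-variance weighting: if machine k returns an unbiased local estimate with variance v_k, independently of the other machines, then weights proportional to 1/v_k minimize the variance of the combined estimate among unbiased linear combinations, giving variance 1/Σ_k(1/v_k). "Plug-in" here means the unknown v_k are replaced by estimates; the sketch uses a local mean squared residual as that stand-in, which is a simplification. All function names are hypothetical, and this is not the DBMD authors' actual estimator.

```python
from typing import List, Sequence


def plug_in_variance(residuals: Sequence[float]) -> float:
    """Plug-in variance estimate: mean squared residual of a local fit."""
    if not residuals:
        raise ValueError("need at least one residual")
    return sum(r * r for r in residuals) / len(residuals)


def plug_in_weighted_average(
    estimates: Sequence[Sequence[float]],
    variances: Sequence[float],
) -> List[float]:
    """Inverse-variance weighted average of per-machine estimates.

    estimates[k] is machine k's local estimate of a shared parameter
    vector; variances[k] is a (plug-in) estimate of its variance.
    Noisier machines receive proportionally smaller weights.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]  # weights sum to 1
    dim = len(estimates[0])
    return [
        sum(w * est[i] for w, est in zip(weights, estimates))
        for i in range(dim)
    ]


def combined_variance(variances: Sequence[float]) -> float:
    """Variance of the inverse-variance weighted average (independent estimates)."""
    return 1.0 / sum(1.0 / v for v in variances)


def simple_average_variance(variances: Sequence[float]) -> float:
    """Variance of the unweighted mean of the same independent estimates."""
    return sum(variances) / len(variances) ** 2


if __name__ == "__main__":
    # Hypothetical example: three machines, the third one much noisier.
    residuals = [[1.0, 0.0], [0.0, -1.0], [2.0, -2.0]]
    local_variances = [plug_in_variance(r) for r in residuals]  # [0.5, 0.5, 4.0]
    local_estimates = [[1.0, 2.0], [1.2, 2.4], [3.0, 0.0]]
    print(plug_in_weighted_average(local_estimates, local_variances))
    print(combined_variance(local_variances), simple_average_variance(local_variances))
```

With equal variances every weight is 1/K and the result reduces to the plain mean. In the example, the weighted combination has variance 1/4.25 ≈ 0.235, versus 5/9 ≈ 0.556 for the plain mean, because the noisy third machine is down-weighted.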
Similar Resources
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance
Given the ever-increasing spread of fraud in insurance, especially in car insurance, and its negative consequences for insurance companies, applying appropriate and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways to discover patterns in data is the use of ...
An Ensemble Clustering for Mining High-dimensional Biological Big Data
Clustering of high-dimensional biological big data is an incredibly difficult and challenging task, as the data space is often too big and too messy. The conventional clustering methods can be inefficient and ineffective on high-dimensional biological big data, because traditional distance measures may be dominated by the noise in many dimensions. An additional challenge in biological big data is ...
High Performance clustering for Big Data Mining using Hadoop
Nowadays, organizations across public and private sectors have made a premeditated decision to turn big data into a competitive advantage. The motivation and challenge of extracting value from big data is similar in many ways to the age-old problem of distilling business intelligence from transactional data. Hadoop is a speedily budding ecosystem of components based on big data Map Reduce algorithm a...
Fast Kernel Matrix Computation for Big Data Clustering
Kernel k-Means is a basis for many state-of-the-art global clustering approaches. When the number of samples grows too big, however, it is extremely time-consuming to compute the entire kernel matrix and it is impossible to store it in the memory of a single computer. The algorithm of Approximate Kernel k-Means has been proposed, which works using only a small part of the kernel matrix. The com...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2022
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2020.3029582